Newest 'scikit-learn machine-learning preprocessing' Questions

1 vote

1 answer

415 views

Getting equal distributions of data from different input sets

I am new to ML. I am trying to create a training dataset that is equally distributed between multiple lists, each of which has a different kind of data. How can I do this? I looked into ...

user81371

11

asked Jul 29, 2022 at 4:08

0 votes

2 answers

717 views

Dynamic creation of sklearn pipeline

I am trying to create an automatic pipeline builder functionality that takes into account a large set of conditions such as the existence of missing values, the scale of numerical features, etc., and ...

lazarea

299

asked Feb 25, 2022 at 15:37

1 vote

1 answer

626 views

scikit-learn OneHot returns tuples and not vectors

First I apply label encoding to all the string columns so they become numeric. After that, I take just the columns with the labels, convert them to an np array, reshape, and convert them to one-...

JamseGoldman

23

asked Sep 25, 2021 at 16:52

1 vote

0 answers

43 views

Should the test dataset be scaled with respect to its distribution or with respect to the distribution of the training dataset? [duplicate]

I have applied data scaling techniques on my training dataset during training. For evaluation, when scaling the test dataset, should it be scaled using the scalers fitted to the training dataset or ...

NIM4

11

asked Jun 30, 2021 at 1:24
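
The usual scikit-learn convention for the question above is to fit the scaler on the training rows only and reuse those statistics on the test rows. The snippet below is a minimal sketch of that convention, not the asker's code; the synthetic data and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))

X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

scaler = StandardScaler()
# Mean and standard deviation are learned from the training rows only.
X_train_scaled = scaler.fit_transform(X_train)
# The test rows are transformed with the training statistics; nothing is refitted.
X_test_scaled = scaler.transform(X_test)
```

Scaled this way, the test set will generally not have exactly zero mean and unit variance, which is expected.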

0 votes

1 answer

84 views

How should a stateless data transformation be applied in regard to train/test split?

I want to apply spatial sign transformation to my data, but unlike other transformations this one is stateless. I am using sklearn and normally I would first use ...

Mateusz

135

asked Jun 14, 2021 at 13:17

1 vote

1 answer

980 views

How to impute missing value in Test Set using a custom Imputer created on training dataset

I am working on a toy project to predict claims. One of the input features has null values on which I have applied a custom imputation technique. Under this technique, I replaced missing values with ...

tanmay

173

asked Jan 18, 2021 at 5:05

1 vote

1 answer

199 views

How to control for covariate shift in test data set compared to train data for regression task?

I am working on a regression project. But I am facing the problem of covariate shift in features due to time delay. Test data was collected a year later due to which there has been some change in ...

saurabh kumar

31

asked Oct 23, 2020 at 14:14

8 votes

1 answer

15k views

Encoding with OrdinalEncoder: how to give levels as user input?

I am trying to do ordinal encoding using: from sklearn.preprocessing import OrdinalEncoder. I will try to explain my problem with a simple dataset. ...

Ayush Ranjan

411

asked Apr 15, 2020 at 0:25
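
One way to supply the level order explicitly is the `categories` parameter of `OrdinalEncoder`. The sketch below assumes a single made-up column with the levels "low", "medium" and "high"; these values are illustrative and do not come from the question.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical single-column data; the level names are assumptions.
X = np.array([["low"], ["high"], ["medium"], ["low"]], dtype=object)

# One list of levels per column, given in the desired order.
levels = [["low", "medium", "high"]]

encoder = OrdinalEncoder(categories=levels)
codes = encoder.fit_transform(X)
# Each value is mapped to its position in the list: low -> 0, medium -> 1, high -> 2.
```

Without `categories`, the encoder infers the levels from the data and sorts them, which for strings means alphabetical order rather than the intended ranking.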

2 votes

1 answer

95 views

Is it compulsory to normalize the dataset if doing so can negatively impact a Binary Logistic regression performance?

I am using a raw dataset with 4 feature variables to do binomial classification using the logistic regression algorithm. I made sure that the class counts are balanced, i.e., an equal number of ...

GYSHIDO

133

asked Sep 9, 2019 at 12:55

0 votes

0 answers

262 views

Pre-processing data to make predictions on deployed Sklearn model

I am new to Machine Learning. I have trained an ML model on the Diamond Prices Dataset to predict the price of a diamond given its features (carat, cut, color, clarity, etc.). I have used pickle to ...

Kag Tes

1

asked Jun 18, 2019 at 13:06

3 votes

1 answer

1k views

Python - Create many dummy variables from one text variable?

I'm trying to create dummy variables for a variable that has text data in rows. Data in 1st row is: ...

Naveen Reddy Marthala

325

asked Mar 28, 2019 at 14:29

62 votes

4 answers

58k views

Difference between OrdinalEncoder and LabelEncoder

I was going through the official documentation of scikit-learn after going through a book on ML and came across the following thing: In the Documentation it is given about ...

Saurabh Singh

763

asked Oct 7, 2018 at 18:55
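
As a rough illustration of the distinction asked about above: `OrdinalEncoder` is meant for 2D feature arrays and encodes each column separately, while `LabelEncoder` is meant for a 1D target. The toy arrays below are assumptions made for this sketch.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Toy feature matrix (2D) and target vector (1D), assumed for illustration.
X = np.array([["red", "S"], ["blue", "M"], ["red", "L"]], dtype=object)
y = np.array(["cat", "dog", "cat"])

# OrdinalEncoder: one integer code per column, input and output are 2D.
feature_codes = OrdinalEncoder().fit_transform(X)

# LabelEncoder: a single mapping for the target, input and output are 1D.
target_codes = LabelEncoder().fit_transform(y)
```

Both encoders sort the observed values by default, so the codes carry no meaningful order unless one is supplied.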

51 votes

3 answers

74k views

StandardScaler before or after splitting data - which is better?

When I was reading about using StandardScaler, most of the recommendations were saying that you should use StandardScaler before ...

tsumaranaina

725

asked Sep 18, 2018 at 2:35

0 votes

1 answer

403 views

What is the best way to normalize histogram vectors to get distribution?

I have the following sample of 4 vectors of dimension 5. They are sparse vectors, in a way that each value in a vector represents the frequency (number of occurrences of values). For instance v_1=[0,4,...

Joseph

225

asked Dec 4, 2017 at 18:06

1 vote

2 answers

404 views

Pre-process data images before training OneClassSVM and decrease number of features

I want to train a OneClassSVM() using sklearn, and I have a set of around 800 images in my training set. I am using opencv to read the images and resize them to constant dimensions (960x540) and then ...

riadrifai

139

asked Nov 4, 2017 at 20:37
